Abstract

In a recent experiment, Freedman et al. recorded from inferotemporal (IT) and prefrontal cortices (PFC) of monkeys performing a "cat/dog" categorization task ([3] and Freedman, Riesenhuber, Poggio, Miller, Soc. Neurosci. Abs.). In this paper we analyze the tuning properties of view-tuned units in our HMAX model of object recognition in cortex [7, 8] using the same paradigm and stimuli as in the experiment. We then compare the simulation results to the monkey inferotemporal neuron population data. We find that view-tuned model IT units that were trained without any explicit category information can show category-related tuning as observed in the experiment. This suggests that the tuning properties of experimental IT neurons might primarily be shaped by bottom-up stimulus-space statistics, with little influence of top-down task-specific information. The population of experimental PFC neurons, on the other hand, shows tuning properties that cannot be explained just by stimulus tuning. These analyses are compatible with a model of object recognition in cortex [10] in which a population of shape-tuned neurons provides a general basis for neurons tuned to different recognition tasks.
1  Introduction


In [10], Riesenhuber and Poggio proposed a model of object recognition in cortex in which a general representation of objects in inferotemporal cortex (IT) provides the basis for different recognition tasks, such as identification and categorization, with task-related units located further downstream, e.g., in prefrontal cortex (PFC). Freedman and Miller recently performed physiology experiments providing experimental population data for both PFC and IT of a monkey trained on a "cat/dog" categorization task ([2, 3] and Freedman, Riesenhuber, Poggio, Miller, Soc. Neurosci. Abs., 2001). In this paper, using the same stimuli as in the experiment, we analyze the properties of view-tuned units in our model, trained without any explicit category information, and compare them to the tuning properties of experimental IT and PFC neurons.

2  Methods 


[Figure 1 schematic, layers from top to bottom: categorization unit (supervised training); view-tuned units (unsupervised training); C2 units; S2 units; C1 units; S1 units. Legend: weighted sum; MAX.]


2.1  The  HMAX  model 

We used the hierarchical object recognition system of Riesenhuber & Poggio [7, 8], shown schematically in Fig. 1. It consists of a hierarchy of layers with linear units performing template matching, and non-linear units performing a "MAX" operation. This MAX operation, which selects the maximum of a cell's inputs and uses it to drive the cell, is key to achieving invariance to translation, by pooling over afferents tuned to different positions, and to scale, by pooling over afferents tuned to different scales. The template matching operation, on the other hand, increases feature specificity. A cascade of these two operations leads to C2 units (roughly corresponding to V4/PIT neurons), which are tuned to complex features invariant to changes in position and scale. The outputs of these units (or a subset thereof) are used as inputs to the view-tuned units (corresponding to view-tuned neurons in IT [5, 8]), which in turn can provide input to units trained on various recognition tasks, for instance cat/dog categorization (for the appropriate simulations, see [9]).
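The interplay of the two operations can be illustrated with a deliberately tiny, one-dimensional sketch. It is not the HMAX implementation: the function names `s_layer` and `c_unit`, the three-element template and the toy signals are all assumptions made for illustration. The "S" stage applies the same weighted-sum template at every position, and the "C" stage takes the MAX over those position-specific responses.

```python
def s_layer(signal, template):
    """Weighted-sum template match at every position of a 1-D signal (toy "S" layer)."""
    n = len(template)
    return [
        sum(w * x for w, x in zip(template, signal[i:i + n]))
        for i in range(len(signal) - n + 1)
    ]


def c_unit(s_responses):
    """MAX pooling over afferents tuned to different positions (toy "C" unit)."""
    return max(s_responses)


# Hypothetical template and two inputs containing the same pattern at different positions.
template = [1.0, -1.0, 1.0]
signal_a = [0, 0, 1, -1, 1, 0, 0, 0]
signal_b = [0, 0, 0, 0, 1, -1, 1, 0]

response_a = c_unit(s_layer(signal_a, template))
response_b = c_unit(s_layer(signal_b, template))
```

In this toy case the pooled response is the same for both inputs, although the individual S responses peak at different positions, which is the sense in which MAX pooling yields translation invariance.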

2.2  Stimulus  space 

Figure 1: Scheme of the HMAX model. Feature specificity and invariance to translation and scale are gradually built up by a hierarchy of "S" and "C" layers [4], respectively. The C2 layer, consisting of units tuned to complex features invariant to changes in position and scale, feeds directly into the view-tuned units, which in turn can provide input to recognition task-specific units, such as a cat/dog categorization unit, as shown (see [9]).

Figure 2: Illustration of the cat/dog stimulus space. The morph space is spanned by the pictures of three cats shown on top ("house cat", "Cheetah" and "Tiger") and the three dogs below ("house dog", "Doberman" and "German Shepherd"). All prototypes have been normalized with respect to viewing angle, lighting parameters, size and color.

The stimulus space is spanned by six prototype objects, three "cats" and three "dogs" (cf. Fig. 2). Our morphing software [11] allows us to generate 3D objects that are arbitrary combinations of the six prototypes. Each object is defined by a six-dimensional morph vector, with the value in each dimension corresponding to the relative proportion of one of the prototypes present in the object. The component sum of each object was constrained to be equal to one. An object was labeled a "cat" or "dog" depending on whether the sum over the "cat" prototypes in its morph vector was greater or smaller, respectively, than the sum over the "dog" prototypes. The class boundary was defined by the set of objects having morph vectors with equal cat and dog component sums. The lines in Fig. 2 show the nine possible morph lines between two prototypes, one of each class, as used in the test set (see below).
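The labeling rule for morph vectors can be written down in a few lines. The sketch below is illustrative only: the function name `label_morph_vector`, the ordering of the components (three cat prototypes first, then three dog prototypes) and the numerical tolerance are assumptions, not details given in the text.

```python
def label_morph_vector(morph_vector, tol=1e-9):
    """Return "cat", "dog", or None (on the class boundary) for a 6-D morph vector.

    Assumed layout: components 0-2 are the cat prototypes, 3-5 the dog prototypes.
    """
    if len(morph_vector) != 6:
        raise ValueError("a morph vector has six components")
    if abs(sum(morph_vector) - 1.0) > tol:
        raise ValueError("morph vector components must sum to one")
    cat_sum = sum(morph_vector[:3])
    dog_sum = sum(morph_vector[3:])
    if abs(cat_sum - dog_sum) <= tol:
        return None  # equal cat and dog sums: the label is undefined
    return "cat" if cat_sum > dog_sum else "dog"


# A stimulus that is 60% of one cat prototype and 40% of one dog prototype.
example_label = label_morph_vector([0.6, 0.0, 0.0, 0.4, 0.0, 0.0])
```

Returning `None` for boundary stimuli mirrors the fact that such stimuli have no defined class label.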

Training set. The training set (a subset of the stimuli used to train the monkeys in [2, 3]) consisted of 144 randomly selected morphed animal stimuli. These were not restricted to the morph lines [2] but chosen at random from the cat/dog morph space, excluding "cats" ("dogs") with a "dog" ("cat") component greater than 40% (as in the experiment).

Test set. The test set used to determine an experimental neuron's or model unit's category tuning consisted of the nine lines through morph space connecting one prototype of each class. Each morph line was subdivided into 10 intervals, with the exclusion of the stimuli at the mid-points (which would lie right on the class boundary, with an undefined label), yielding a total of 78 stimuli (eight interior stimuli on each of the nine lines, plus the six prototypes, which are shared between lines).
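The count of 78 test stimuli follows from the construction and can be checked by enumeration. In this sketch the representation of morph vectors as integer tenths, the component ordering (cats first) and the function name `build_test_set` are assumptions chosen for illustration.

```python
from itertools import product

STEPS = 10  # each morph line is divided into 10 intervals


def build_test_set():
    """Enumerate the distinct stimuli on the nine cat-dog morph lines, in tenths."""
    stimuli = set()
    for cat, dog in product(range(3), range(3)):  # nine cat/dog prototype pairs
        for k in range(STEPS + 1):                # k tenths of the dog prototype
            if 2 * k == STEPS:
                continue                          # skip the mid-point on the boundary
            vec = [0] * 6
            vec[cat] += STEPS - k
            vec[3 + dog] += k
            stimuli.add(tuple(vec))               # prototypes shared by lines are stored once
    return sorted(stimuli)


test_set = build_test_set()
n_cats = sum(1 for v in test_set if sum(v[:3]) > sum(v[3:]))
```

Using a set removes the duplicates that arise because each prototype is the end point of three morph lines; the enumeration yields 78 stimuli, 39 in each class.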

2.3  Learning  a  class  representation 

One view-tuned unit (VTU), connected to all or a subset of C2 units, was allocated for each training stimulus, yielding 144 view-tuned units.* The two parameters affecting the tuning characteristics of the VTUs are the number of afferent C2 units a (sorted by decreasing strength [7]) and the Gaussian tuning width σ. Experiments were run using 8, 32, 128 and 256 afferents to each VTU and σ values of 0.1, 0.2, 0.4, 0.8 and 1.6. For the sake of clarity we will present only four of those 20 combinations: (a = 32, σ = 0.1), (a = 32, σ = 0.2), (a = 256, σ = 0.1) and (a = 256, σ = 0.2). Using 8 afferents produced units whose tuning was too unspecific, while σ values above 0.2 yielded unrealistically broad tuning.
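As a rough sketch of how the two parameters shape a VTU, the code below builds a unit from the C2 activation pattern of one training stimulus. It assumes that "strength" means the C2 activation evoked by the training stimulus and that the response is an unnormalized Gaussian of the Euclidean distance over the selected afferents; both are assumptions for illustration, and `make_vtu` is a hypothetical name.

```python
import math


def make_vtu(c2_training, a, sigma):
    """Build a view-tuned unit from the C2 activations of one training stimulus.

    a     -- number of afferents (the a strongest C2 units for this stimulus)
    sigma -- Gaussian tuning width
    """
    strongest = sorted(range(len(c2_training)),
                       key=lambda i: c2_training[i], reverse=True)[:a]
    center = [c2_training[i] for i in strongest]

    def respond(c2):
        # Squared distance to the preferred stimulus, restricted to the afferents.
        d2 = sum((c2[i] - c) ** 2 for i, c in zip(strongest, center))
        return math.exp(-d2 / (2.0 * sigma ** 2))

    return respond


# Toy four-dimensional "C2" pattern; real C2 layers are much larger.
c2_pattern = [0.9, 0.1, 0.5, 0.7]
vtu = make_vtu(c2_pattern, a=2, sigma=0.1)
peak = vtu(c2_pattern)
```

With this construction the unit responds maximally to its own training stimulus, ignores changes in C2 units that are not among its afferents, and becomes more narrowly tuned as σ decreases.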

2.4  Evaluating  category  tuning 

We use three measures to characterize the category-related behavior of experimental neurons and model units: the between-within index (BWI), the class coverage index (CCI) and the receiver operating characteristic (ROC).

* Results were similar for 32 VTUs obtained from 144 stimuli through k-means clustering [9].

Figure 3: Number of view-tuned units tuned to stimuli at certain morph indices for different numbers of afferents (panels: a = 32 and a = 256). The morph index is the percentage of the dog prototype in the stimulus, e.g., morph index 0.4 corresponds to a morphed stimulus which is 40% dog and 60% cat.

BWI The between-within index (BWI) [2, 3] is a measure for tuning at the class boundary relative to the class interior. Considering the response of a unit to stimuli along one morph line, the response difference between two adjacent stimuli can be calculated. As there is no stimulus directly on the class boundary, we use 20% steps for calculating the response differences. Let btw be the mean response difference between the two categories (i.e., between morph index 0.4 and 0.6) and wi the mean response difference within the categories. Then the between-within index is

$$\mathrm{BWI} = \frac{btw - wi}{btw + wi} \qquad (1)$$

Thus, the range of BWI values is -1 to +1. For a BWI of zero the unit shows on average no different behavior at the boundary compared to the class interiors. Positive BWI values indicate a significant response drop across the border (e.g., for units differentiating between classes) whereas negative values are characteristic for units which show response variance within the classes but not across the boundary.
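A minimal sketch of the between-within index follows. It assumes that "response difference" means the absolute difference between adjacent stimuli in 20% steps, that differences are pooled over all morph lines before taking means, and that a unit with no response differences at all gets a BWI of zero; these choices and the name `between_within_index` are illustrative assumptions.

```python
def between_within_index(lines):
    """BWI from responses along morph lines.

    lines -- list of morph lines, each a list of six responses at morph
             indices 0, 0.2, 0.4, 0.6, 0.8 and 1.0
    """
    between, within = [], []
    for responses in lines:
        if len(responses) != 6:
            raise ValueError("expected six responses per morph line")
        steps = [abs(responses[i + 1] - responses[i]) for i in range(5)]
        between.append(steps[2])              # 0.4 -> 0.6 crosses the class boundary
        within.extend(steps[:2] + steps[3:])  # the four within-class steps
    btw = sum(between) / len(between)
    wi = sum(within) / len(within)
    if btw + wi == 0:
        return 0.0                            # flat response: no tuning either way
    return (btw - wi) / (btw + wi)


# A unit that drops sharply at the boundary, and one that varies only within classes.
bwi_step = between_within_index([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0]])
bwi_within = between_within_index([[1.0, 0.5, 0.0, 0.0, 0.5, 1.0]])
```

The two toy units sit at the extremes of the index: a pure step at the boundary gives +1, while a unit whose response changes only inside the classes gives -1.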

CCI The class coverage index (CCI) [2] is the proportion of stimuli in the unit's preferred category that evoke responses higher than the maximum response to stimuli from the other category. Possible values range from 1/39,† meaning that out of the 39 stimuli in the class only the maximum itself evokes a higher response than the maximum in the other class, to 1 for full class coverage, i.e., perfect separability.

† When comparing model units and experimental neurons, CCI values were calculated using the 42 stimuli used in the experiment (see section 4), so the minimum CCI value was 1/21.

ROC The receiver operating characteristic (ROC) curve [6] shows the categorization performance of a unit in terms of correctly categorized preferred-class stimuli (hits) vs. miscategorized stimuli from the other class (false alarms). The area under the ROC curve is a measure of the quality of categorization. A value of 0.5 corresponds to chance performance, 1 means perfect separability, i.e., perfect categorization performance. ROC values were obtained by fixing the activation threshold and counting hits and false alarms using this threshold. Activation values for all stimuli were used as thresholds to obtain ROC curves as detailed as possible. The ROC area values were computed using trapezoidal numerical integration of the ROC curve.
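Both measures can be sketched directly from these definitions. The code assumes that the caller has already split the responses into preferred-class and other-class lists, and that a stimulus counts as "categorized as preferred" when its response is at or above the threshold; the function names are hypothetical.

```python
def class_coverage_index(preferred, other):
    """Fraction of preferred-class responses above the best other-class response."""
    ceiling = max(other)
    return sum(1 for r in preferred if r > ceiling) / len(preferred)


def roc_area(preferred, other):
    """Area under the ROC curve, using every activation value as a threshold."""
    thresholds = sorted(set(preferred) | set(other), reverse=True)
    points = [(0.0, 0.0)]                       # (false-alarm rate, hit rate)
    for t in thresholds:
        hits = sum(r >= t for r in preferred) / len(preferred)
        false_alarms = sum(r >= t for r in other) / len(other)
        points.append((false_alarms, hits))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0     # trapezoidal rule
    return area


# Toy responses: one preferred-class stimulus falls below the best other-class response.
cci_example = class_coverage_index([0.9, 0.4], [0.5, 0.1])
roc_example = roc_area([0.9, 0.4], [0.5, 0.1])
```

In the toy example the CCI is 0.5, since only one of the two preferred-class stimuli exceeds the best other-class response, and the ROC area is 0.75, which equals the fraction of preferred/other pairs that are ranked correctly.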

3  Results 

3.1  VTU  positions 

We defined the position of a VTU as the position of the stimulus along the nine morph lines that maximizes the VTU's response. This approach yields an asymmetric distribution of VTUs over the stimulus space (Fig. 3). As the training set consisted of 72 cats and 72 dogs, this asymmetry suggests that some morphed "cats" look similar to "dogs" in the space of C2 activations.

3.2  Shape  tuning 

The response function over the stimulus space of a view-tuned unit is a Gaussian centered at the unit's preferred stimulus, dropping exponentially fast (depending on the unit's standard deviation σ) in all directions. Thus, the VTU will respond to every stimulus presented; however, there will be a significant response to only some of those stimuli.

Fig. 4 shows the response to the stimuli along the nine morph lines of a VTU tuned close to the 80% cat stimulus on the fifth line (connecting Cheetah and Doberman). For small σ, the tuning is much tighter around this maximum, showing only little response to stimuli involving other cat prototypes. With 32 afferents and a σ of 0.1 this unit's behavior could be described as categorizing cats vs. dogs along the morph lines involving cat prototype 2. With 256 afferents the tuning is even tighter because the stimulus is specified in a 256-dimensional space instead of a 32-dimensional one. Because the unit is tuned to a randomly generated stimulus not necessarily lying on a morph line, there is hardly any response for a = 256 and σ = 0.1.

3.3  Category  tuning 
3.3.1  BWI 

The between-within index is a measure for response changes at the class boundary vs. the class interiors. For a given parameter set, all units share the same response function shape, only varying in the location of their preferred stimulus in morph space. Thus, the major response decay of prototype-tuned units will be within their class, showing no observable response to stimuli near the border or in the other class, yielding negative BWI values. Border-tuned units show a substantial drop in response over the boundary with a much lower mean difference within class, resulting in a positive between-within index. As shown in Fig. 5, for units with preferred stimuli at different positions along each morph line, units tuned with 256 afferents show exactly this behavior. However, when using VTUs with only 32 afferents the values tend to be closer to zero. This is due to the less precise tuning of those VTUs (cf. Fig. 4), yielding smaller absolute values of the BWI.

The histograms of BWI values (Fig. 6) for the 144 units reflect this fact. The broader the VTU tuning gets with decreasing a and increasing σ, the tighter the distribution of BWI values is centered around zero. For 256 afferents, there is a significant shift of the whole distribution towards negative values (p < 0.01).

Figure 4: Grayscale plot of a VTU response along the nine morph lines. Each horizontal line represents one cross-boundary morph line. The vertical middle line is inserted to visually separate cat (left) and dog (right) stimuli. Cat prototypes (C1, C2, C3) are plotted to the left of the boundary, dog prototypes (D1, D2, D3) to the right. The columns in between correspond to morphed stimuli in 10% steps. A color scale indicates the response level.

Figure 5: Mean between-within index (BWI) of view-tuned units with preferred stimuli spaced along the morph lines. Error bars show standard deviation across the nine morph lines.

Figure 6: Histograms of between-within index (BWI) values for the 144 model VTUs. The dashed line indicates the mean between-within index over all view-tuned units.

3.3.2  CCI 

The class coverage index does not depend on σ because changing σ will change the width of the Gaussian but not its shape or position. Clearly, the CCI of a VTU is dependent on the position of its preferred stimulus in morph space. Units tuned to stimuli near the class boundary will have lower CCI values because the response level to stimuli on the other side of the border will be quite high. The class coverage index for units tuned to stimuli near the center of a class (e.g., at morph line positions 0.2 and 0.8) will be higher, as the maximum response to an other-class stimulus will be lower because of the bigger distance from the center of tuning to the class boundary. Units tuned to the prototype stimuli will have smaller CCI values again due to the visual dissimilarity of the prototypes (cf. Fig. 7). As can be seen from Fig. 2, the visual appearance of prototypes of the same class is quite different. Thus, e.g., the distance in the space of C2 activations from the Tiger prototype to the morphed stimulus which is only 40% Tiger and 60% Doberman is smaller than the distance from the Tiger prototype to the Cheetah prototype. As indicated by the error bars, there is a wide variance for the class coverage index at one position on a morph line.

As can be seen in Fig. 8, the VTUs' CCI values are shifted towards zero when increasing the number of afferents, since the more specific tuning will emphasize the dissimilarity of prototypes of the same class. For a = 32, there are some units with CCI values of 0.4 and above. This means those units show a response behavior similar to categorization in certain parts of stimulus space. Fig. 9 shows the response of a single VTU along the nine morph lines. With the unit's category threshold at the indicated position the categorization performance is 85%.

Figure 7: Mean class coverage index (CCI) of view-tuned units with preferred stimuli spaced along the morph lines. Error bars show standard deviation across the nine morph lines.

Figure 8: Histograms of class coverage index (CCI) values. The dashed line indicates the mean class coverage index over all view-tuned units.

3.3.3  ROC 

The CCI value corresponds (up to a factor) to the number of stimuli that evoke responses higher than the maximum other-class stimulus. Thus, this value is equivalent to the number of correctly categorized stimuli with no false alarms (i.e., no other-class stimulus being miscategorized), which is the initial point of the ROC curve. Fig. 11 shows the distribution of A_ROC values for different numbers of afferents (as changing σ only affects the response magnitude to individual stimuli but does not change their ranking, A_ROC is independent of σ). For 256 afferents, about half of the units have values over 0.6 with a maximum of 0.74. For 32 afferents about 15% of the VTUs have an A_ROC value of more than 0.8, up to 0.94. This clearly shows that there is a substantial number of VTUs able to categorize with a remarkable performance, without the benefit of any category information during training.

Figure 9: Response of a view-tuned unit with maximum response to an 80% dog stimulus (one of 144 VTUs, 32 afferents to each VTU, σ = 0.1) along the nine morph lines. The dashed line shows the average over all morph lines. The solid horizontal line shows a possible class boundary yielding best categorization performance.

Figure 10: A_ROC values of view-tuned units with preferred stimuli spaced along the morph lines. Error bars show standard deviation across the nine morph lines.

4  Comparison  of  model  and  experiment 

We compared the tuning properties of model units to those of the IT and PFC neurons recorded by Freedman [2, 3] from two monkeys performing the cat/dog categorization task.‡ In the following, we restrict our analysis to the neurons that showed stimulus selectivity by an ANOVA (p < 0.01) over the 42 stimuli along the nine morph lines used in the experiment (in the experiment, stimuli were located at positions 0, 0.2, 0.4, 0.6, 0.8, and 1 along each morph line). Thus, we only analyzed those neurons that responded significantly differently to at least one of the stimuli.§

In particular, we analyzed a total of 116 stimulus-selective IT neurons during the "sample" period (100 ms to 900 ms after stimulus onset). Only a small number of IT neurons responded selectively during the delay period. For the PFC data, there were 67 stimulus-selective neurons during the sample period, and 32 stimulus-selective neurons during the immediately following "delay" period (300 to 1100 ms after stimulus offset), during which the monkey had to keep the category membership of the previously presented sample stimulus in mind, to compare it to a test stimulus presented subsequently (at 1000 ms after stimulus offset) [3].

Figs. 13 through 15 show the BWI, CCI, and A_ROC distributions for the IT neurons (during the sample period; IT neurons tended to show much less delay activity than the PFC neurons), and the PFC neurons (during sample and delay periods, respectively).¶

‡ The monkeys had to perform a delayed match-to-category task. The first stimulus was shown for 600 ms, followed by a 1 s delay and the second, test, stimulus. See [2, 3] for details.

Figure 11: A_ROC values of the view-tuned units sorted in ascending order.

4.1  IT 

Comparing the view-tuned model unit data to the experimental IT data (Fig. 16 and Fig. 13), we observe a very good agreement of the BWI distributions of model units and IT neurons: both are centered around zero and show a mean not significantly different from 0. Further, the ROC plots show very similar means and, even more importantly, identical maxima (0.94). This shows that high ROC values can be obtained without any explicit category information, and moreover that the range of ROC values of experimental IT neurons is well compatible with that of view-tuned model units. There do appear to be some differences in the distribution of ROC values, with the experimental distribution having proportionally fewer neurons with intermediate ROC values.

§ Extending the analysis to include all responsive neurons (relative to baseline, p < 0.01) added mainly untuned neurons with CCIs close to 0, and A_ROC values close to 0.5.

¶ For comparison with the model, the indices and ROC curves were calculated using a neuron's averaged firing rate (over at least 10 stimulus presentations) to each stimulus.
Differences in the CCI distributions appear to be more substantial. However, the highest CCI in the model is greater than that of the experimental IT neurons, showing that model units can show similar degrees of category tuning as the experimental neurons.

Figure 12: Distribution of preferred stimuli (morph indices) for experimental IT neurons.

4.1.1  Noise 

What could be the source of the differences between the tuning properties of model units and experimental neurons? One factor is the deterministic response of model units in contrast to the noisy responses of experimental neurons, which show trial-to-trial variations even for the same stimulus. Such random fluctuations in a neuron's firing rate can have a strong impact on the CCI value of neurons with preferred stimuli near the class boundary, where stimuli belonging to different classes produce similar responses, pointing to a possible explanation for the high number of neurons in the experiment with low CCI values.

Indeed, adding independent Gaussian noise to the responses of model units produces only modest shifts in the BWI and ROC distributions, but leads to a CCI distribution that is dominated by units with low CCI values, as in the experiment (Fig. 17). In the ROC value distribution, the proportion of units with intermediate ROC values decreased, producing a more "convex" shape as in the experiment. In general, the agreement with the experimental distribution is excellent: BWI and ROC distributions are not statistically significantly different (p > 0.2, Wilcoxon rank sum test), and the CCI distribution is only marginally different (p = 0.06).
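The noise manipulation can be sketched as follows. The code assumes that the noise "amplitude" n is the standard deviation of zero-mean Gaussian noise added independently to each stimulus response, and that an index is averaged over repeated noisy trials; the function names, the seed handling and the toy responses are hypothetical.

```python
import random


def add_noise(responses, n, rng):
    """Add independent zero-mean Gaussian noise (standard deviation n) to each response."""
    return [r + rng.gauss(0.0, n) for r in responses]


def mean_index_under_noise(responses, index_fn, n, trials=100, seed=0):
    """Average an index (any function of the response list) over noisy trials."""
    rng = random.Random(seed)
    values = [index_fn(add_noise(responses, n, rng)) for _ in range(trials)]
    return sum(values) / trials


# Toy example: the average peak response of a unit under noise of amplitude 0.1.
clean = [0.2, 0.5, 0.9, 0.4]
noisy_peak = mean_index_under_noise(clean, max, n=0.1)
```

Passing a function such as a CCI or BWI computation as `index_fn` would give the trial-averaged index for a noisy unit; `max` is used here only to keep the example self-contained.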

4.1.2  Resampling 

Another factor that might affect the population tuning properties is the distribution of preferred stimuli. In fact, calculating the distribution of preferred stimuli of the experimental IT neurons reveals a difference between experimental and model populations: as Fig. 12 shows, almost half of all experimental neurons have preferred stimuli at the class boundary, whereas the model units have a distribution that contains more units tuned to morph line centers (Fig. 3).

This difference could either be the signature of task-dependent influences on IT learning, or it could be due to statistics of the stimulus ensemble, as the later stages of the monkeys' training focused on stimuli close to the boundary (which were most difficult for the monkeys to learn). It will be interesting to examine this question more closely in future studies where stimulus exposure is better controlled.

We investigated the effect of the distribution of preferred stimuli on the population tuning properties by resampling from the population of 144 model units to obtain a population with a distribution of preferred stimuli as in Fig. 12. Afterwards, a population of 116 units, the size of the experimental population, was drawn from those noisy units such that the distribution of units over the morph indices fit the experimental population. This procedure was repeated for a total of 100 trials and the results were averaged to obtain the correct number of values.
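One way such a resampling step could look is sketched below. The text does not specify the sampling scheme, so drawing with replacement within each morph-index bin is an assumption, as are the function name `resample_units`, the integer morph-index coding and the toy counts.

```python
import random


def resample_units(model_morph_index, target_counts, rng):
    """Draw model units so that their preferred morph indices match target counts.

    model_morph_index -- preferred morph index of each model unit
    target_counts     -- {morph index: number of units wanted at that index}
    Returns the indices of the drawn units (sampling with replacement).
    """
    bins = {}
    for unit, morph in enumerate(model_morph_index):
        bins.setdefault(morph, []).append(unit)
    drawn = []
    for morph, count in target_counts.items():
        candidates = bins.get(morph, [])
        if count > 0 and not candidates:
            raise ValueError(f"no model unit prefers morph index {morph}")
        drawn.extend(rng.choice(candidates) for _ in range(count))
    return drawn


# Toy population: six units with preferred morph indices given in tenths.
toy_model = [0, 0, 2, 4, 4, 4]
toy_target = {0: 1, 4: 3}
toy_sample = resample_units(toy_model, toy_target, random.Random(0))
```

For the populations described in the text, `target_counts` would sum to 116, and the draw would be repeated (for example 100 times with different random states) before averaging the resulting index distributions.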

Fig. 18 shows the population tuning properties of the resampled model distribution (with a distribution of preferred stimuli as in Fig. 12), chosen from the deterministic model units of Fig. 16. Interestingly, the distributions are not very different from the non-resampled case, in line with the results in Figs. 5, 7, and 10 that show only modest changes in the index values for units with preferred stimuli at the border compared to directly adjacent positions.

We investigated the combined effect of noise and distribution of preferred stimuli on population tuning properties by adding independent Gaussian noise with amplitude n to the responses of model units, and resampling from the population of 144 model units to obtain a population with a distribution of preferred stimuli as in Fig. 12. In particular, the morph indices of the noisy model units were determined after adding noise to their response, to allow for possible shifts in the location of the preferred stimulus in morph space. As before, this procedure was repeated 100 times and the results were averaged to obtain the correct number of values.

Fig. 19 shows the population tuning properties of the resampled model distribution, for neurons with a noise level of n = 0.08 (a noise level of n = 0.1, as in Fig. 17, produced only slightly worse fits to the experimental distribution). We again find very good agreement with the experimental distribution: BWI and ROC distributions are not statistically significantly different (p > 0.1, Wilcoxon rank sum test), and the CCI distribution is only marginally different (p = 0.03).
Figure 13: Experimental IT data. The plots show the distribution of BWI (left), CCI (center) and ROC (right) area values.

Figure 14: Experimental PFC data (sample period). The plots show the distribution of BWI, CCI and ROC area.

Figure 15: Experimental PFC data (delay period). The plots show the distribution of BWI, CCI and ROC area.

Figure 16: Model IT data for a = 32, σ = 0.2. The plots show the distribution of BWI, CCI, and ROC area values.

Figure 17: Model IT data for model units from Fig. 16, with added independent Gaussian noise of amplitude n = 0.1. The plots show the distribution of BWI, CCI, and ROC. Values shown are the average over 100 trials.

Figure 18: Resampled model IT data, deterministic units. The plots show the distribution of BWI, CCI and ROC area. Values shown are the average over 100 trials.

Figure 19: Resampled model IT data, noise level 0.08. The plots show the distribution of BWI, CCI and ROC area. Values shown are the average over 100 trials.

Figure 20: Fitted model IT data. The plots show the distribution of BWI, CCI and ROC area.
4.1.3  Fitting

As a further demonstration that model IT unit tuning is well compatible with experimental tuning, we have investigated how well the experimental IT population can be fitted using the model view-tuned units. To this end, we obtained the fitted population by selecting, for every cell found in the experiment, the best fitting model unit. This was done simply by summing the absolute differences of the BWI, CCI, ROC area, and morph index values between the experimental neuron and each model unit, and comparing these sums across model units. Using this procedure produces a model unit population with tuning properties that are statistically not different from the population tuning found for the experimental IT neurons (p > 0.1 for all three of BWI, CCI, and ROC).
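The fitting step amounts to a nearest-neighbor search under a summed absolute difference. The sketch below assumes that each neuron and each model unit is described by a tuple (BWI, CCI, ROC area, morph index), that the four terms are weighted equally, and that the same model unit may be selected for several neurons; the function name and the toy values are hypothetical.

```python
def fit_population(neurons, model_units):
    """For each neuron, return the index of the best-fitting model unit.

    neurons, model_units -- lists of (BWI, CCI, ROC area, morph index) tuples
    """
    def cost(neuron, unit):
        # Sum of absolute differences over the four descriptors.
        return sum(abs(a - b) for a, b in zip(neuron, unit))

    return [
        min(range(len(model_units)), key=lambda j: cost(neuron, model_units[j]))
        for neuron in neurons
    ]


# Toy data: two "neurons" and three candidate model units.
toy_neurons = [(0.00, 0.10, 0.60, 0.4), (-0.20, 0.30, 0.85, 0.8)]
toy_units = [(0.50, 0.50, 0.90, 1.0), (0.05, 0.10, 0.55, 0.4), (-0.25, 0.30, 0.80, 0.8)]
toy_fit = fit_population(toy_neurons, toy_units)
```

The fitted population is then the list of selected model units, whose index distributions can be compared with the experimental ones.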

Thus, in summary, the degree of category tuning of experimental IT neurons appears to be very well captured by the population of view-tuned model units. As model units were trained without any explicit category information, the agreement of experimental IT and model data suggests that the learning of IT neuron response properties can be understood as largely driven by shape similarities in input space, without any influence of explicit category information.

4.2 Comparison of model units vs. PFC neurons

The PFC neurons show a BWI distribution with a positive mean
significantly different from zero (sample period: 0.09, delay: 0.15),
combined with higher average CCI values (sample: 0.21, delay: 0.21),
with single neurons reaching values as high as 0.76 (sample and
delay). Unlike in the IT case, this maximum value lies outside the
range of CCI values of model units. Moreover, a positive average BWI
of the magnitude found in the PFC data could only be obtained in the
model with a significant number of border-tuned neurons (cf. Fig. 5).
Such border-tuned units have very low CCI values (cf. Fig. 7). CCI
values of PFC neurons, however, are higher than those of IT neurons.
Thus, the tuning properties of PFC neurons cannot be explained in the
model by stimulus tuning alone, but seem to require the influence of
explicit category information during training.

5  Discussion 

In this paper, we have analyzed the tuning properties of model
view-tuned units tuned to the same stimuli that were used in a recent
experiment [2, 3] in which monkeys were trained on a "cat/dog"
categorization task, followed by recordings from their inferotemporal
and prefrontal cortices. Using the same analysis methods as in the
experiment, we found that view-tuned model units showed tuning
properties very similar to those of monkey IT neurons. In particular,
as with IT cells in the experiment, we found that some view-tuned
units showed "categorization-like" behavior, i.e., very high ROC
values. Most notably, this tuning emerged as a consequence of the
shape-tuning of the view-tuned units, with no influence of category
information during training. In contrast, the population of PFC
neurons showed tuning properties that could not be explained by mere
stimulus tuning. Rather, the simulations suggest the explicit
influence of category information in the development of PFC neuron
tuning.

These different response properties of neurons in the two brain
areas, with IT neurons coding for stimulus shape and PFC neurons
showing more task-related tuning, are compatible with a recent model
of object recognition in cortex [8, 10] in which a general object
representation based on view- and object-tuned cells provides a basis
for neurons tuned to specific object recognition tasks, such as
categorization. This theory is also supported by data from another
experiment in which different monkeys were trained on an
identification and a categorization task, respectively, using the
same stimuli [1], and which found no differences in the stimulus
representation by inferotemporal neurons of the monkeys trained on
different tasks. On the other hand, a recent experiment [12] reported
IT neuron tuning emphasizing category-relevant features over
non-relevant features (but no explicit representation of the class
boundary, unlike in [3]) in monkeys trained to perform a
categorization task. Further studies comparing IT neuron tuning
before and after training on a categorization task or even different
tasks involving the same set of stimuli, and studies that investigate
the possibility of top-down modulation from higher areas (e.g., PFC)
during task execution, will be needed to more fully understand the
role of top-down task-specific information in shaping IT neuron
tuning. The present study demonstrated the use of computational
models to motivate and guide the analysis of experimental data.
Clearly, the road ahead will equally require a very close interaction
of experiments and computational work.